Explore the TensorFlow Speech Commands Dataset

This notebook introduces the TensorFlow Speech Commands Dataset, a collection of one-second .wav audio files, each containing a single spoken English word. The words come from a small vocabulary of commands and are spoken by a variety of different speakers. The dataset was designed for limited-vocabulary speech recognition tasks and can be obtained for free from the IBM Developer Data Asset Exchange.

In this notebook, we download the dataset archive from cloud storage, extract it, explore the data, and import a selection of audio samples into our Watson Studio project.

Table of Contents:

  0. Prerequisites
  1. Download and extract the dataset archive
  2. Inspect audio samples
  3. Add dataset files to the Watson Studio project

0. Prerequisites

Before you run this notebook, complete the following steps:

  • Insert a project token
  • Import required packages

Insert a project token

When you import this project from the Watson Studio Gallery, a token should be automatically generated and inserted at the top of this notebook as a code cell such as the one below:

# @hidden_cell
# The project token is an authorization token that is used to access project resources like data sources, connections, and used by platform APIs.
from project_lib import Project
project = Project(project_id='YOUR_PROJECT_ID', project_access_token='YOUR_PROJECT_TOKEN')
pc = project.project_context

If you do not see the cell above, follow these steps to enable the notebook to access the dataset from the project's resources:

  • Click on More -> Insert project token in the top-right menu section

[Video: ws-project.mov shows how to insert a project token.]

  • This should insert a cell at the top of this notebook similar to the example given above.

    If an error is displayed indicating that no project token is defined, follow these instructions.

  • Run the newly inserted cell before proceeding with the notebook execution below
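
If you want to confirm that the token works before going further, the optional cell below prints the project name and the number of existing data assets. It is a minimal sketch that assumes the hidden token cell above has run and that the standard project_lib methods get_name() and get_files() are available in your environment.

In [ ]:
# Optional sanity check: confirm the inserted project token grants access to this project.
# Assumes the hidden token cell above has already run and that the project_lib methods
# get_name() and get_files() are available (an assumption, not used elsewhere in this notebook).
try:
    print('Connected to project: {}'.format(project.get_name()))
    print('Existing data assets: {}'.format(len(project.get_files())))
except NameError:
    print('No project token found. Use More -> Insert project token and re-run that cell.')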

Import required packages

In [2]:
# Packages used for downloading, extracting, and exploring the dataset
import os
import tarfile
from pathlib import Path

import requests
import IPython.display as ipd
from IPython.display import Markdown, display

import plotly.offline as py
import plotly.graph_objs as go

py.init_notebook_mode(connected=True)

def printmd(string):
    """Render a Markdown-formatted string in the notebook output."""
    display(Markdown(string))

1. Download and extract the dataset archive

First, we download the TensorFlow Speech Commands dataset archive from the Data Asset Exchange cloud storage and extract the data files.

In [3]:
# Dataset archive location on public cloud storage
fname = 'tensorflow-speech-commands.tar.gz'
url = 'https://dax-cdn.cdn.appdomain.cloud/dax-tensorflow-speech-commands/1.0.1/'
download_link = url + fname
# Directory that the archive extracts into
data_path = 'TensorFlow-Speech-Commands'
# Audio samples ('word/speaker-file') that we will listen to and save as project assets
filenames = ['on/0a7c2a8d_nohash_0.wav', 'off/0ab3b47d_nohash_0.wav', 'up/0a7c2a8d_nohash_0.wav',
             'bird/0a7c2a8d_nohash_0.wav', 'bird/0c2ca723_nohash_1.wav', 'sheila/00f0204f_nohash_1.wav',
             'cat/0ab3b47d_nohash_0.wav', 'dog/0b09edd3_nohash_1.wav', 'right/0a7c2a8d_nohash_0.wav',
             'bird/0b77ee66_nohash_0.wav', 'bird/0eb48e10_nohash_1.wav', 'bird/0fa1e7a9_nohash_0.wav',
             'bird/1d919a90_nohash_2.wav', 'zero/0c40e715_nohash_0.wav']

Download and extract the dataset archive.

In [4]:
print('Downloading dataset archive {} ...'.format(download_link))

r = requests.get(download_link)

if r.status_code != 200:
    print('Error. Dataset archive download failed.')
else:
    # save the downloaded archive
    print('Saving downloaded archive as {} ...'.format(fname))
    with open(fname, 'wb') as downloaded_file:
        downloaded_file.write(r.content)
    
    if tarfile.is_tarfile(fname):
        # extract the downloaded archive
        print('Extracting downloaded archive ...')
        with tarfile.open(fname, 'r') as tar:
            tar.extractall()
        print('Removing downloaded archive ...')
        Path(fname).unlink()
        print('Done.')
    else:
        print('Error. The downloaded file is not a valid TAR archive.')
    
Downloading dataset archive https://dax-cdn.cdn.appdomain.cloud/dax-tensorflow-speech-commands/1.0.1/tensorflow-speech-commands.tar.gz ...
Saving downloaded archive as tensorflow-speech-commands.tar.gz ...
Extracting downloaded archive ...
Removing downloaded archive ...
Done.
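
Note that the cell above buffers the entire archive in memory before writing it to disk. If memory is tight, a streaming download is a safer pattern; the optional cell below is a sketch of that approach and is not required for the rest of the notebook.

In [ ]:
# Optional alternative: stream the archive to disk in chunks instead of buffering
# the full response in memory (useful when the archive is large).
import requests

def download_streaming(link, target, chunk_size=1024 * 1024):
    with requests.get(link, stream=True) as resp:
        resp.raise_for_status()
        with open(target, 'wb') as out:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                out.write(chunk)

# Example usage (uncomment to re-download the archive):
# download_streaming(download_link, fname)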

2. Inspect audio samples

In this section, we inspect the TensorFlow Speech Commands dataset that we just downloaded and extracted.

The extracted dataset contains 31 folders: 30 word folders plus a folder of background-noise recordings. Of the 30 words, 20 are core command words and 10 are auxiliary words that can be used to test how well a model ignores speech that does not contain a trigger word. The audio clips were originally collected by Google and recorded by volunteers in uncontrolled locations around the world.
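
As a quick consistency check, the word lists described above can be compared against the extracted folders. The optional cell below is a sketch; the core and auxiliary word lists are taken from the dataset documentation and match the folder names in this archive.

In [ ]:
# Word lists as described in the dataset documentation.
CORE_WORDS = {'yes', 'no', 'up', 'down', 'left', 'right', 'on', 'off', 'stop', 'go',
              'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'}
AUX_WORDS = {'bed', 'bird', 'cat', 'dog', 'happy', 'house', 'marvin', 'sheila', 'tree', 'wow'}

# Compare against the extracted word folders (ignoring the _background_noise_ folder)
extracted = {d for d in os.listdir(data_path)
             if os.path.isdir(os.path.join(data_path, d)) and not d.startswith('_')}
print('Missing word folders:   ', sorted((CORE_WORDS | AUX_WORDS) - extracted))
print('Unexpected word folders:', sorted(extracted - (CORE_WORDS | AUX_WORDS)))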

In [5]:
# Metadata files that ship with the archive and should not be treated as labels
metadata_files = ['info.txt', 'LICENSE', 'validation_list.txt', 'README.md', 'testing_list.txt']
# Save audio sample labels (one label per folder)
labels = [name for name in os.listdir(data_path)
          if name not in metadata_files and os.path.isdir(os.path.join(data_path, name))]
# Count the number of .wav samples in each label folder
recordings = []
for label in labels:
    samples = [f for f in os.listdir(os.path.join(data_path, label)) if f.endswith('.wav')]
    recordings.append(len(samples))

printmd('**Words and number of audio samples per label:**')
print([(labels[i], recordings[i]) for i in range(len(labels))])

Words and number of audio samples per label:

[('dog', 1746), ('up', 2375), ('five', 2357), ('seven', 2377), ('off', 2357), ('no', 2375), ('four', 2372), ('eight', 2352), ('wow', 1745), ('_background_noise_', 6), ('down', 2359), ('six', 2369), ('right', 2367), ('tree', 1733), ('marvin', 1746), ('two', 2373), ('happy', 1742), ('three', 2356), ('sheila', 1734), ('zero', 2376), ('cat', 1733), ('nine', 2364), ('on', 2367), ('bird', 1731), ('left', 2353), ('house', 1750), ('yes', 2377), ('stop', 2380), ('bed', 1713), ('go', 2372), ('one', 2370)]

The raw list above is hard to scan and makes it difficult to compare the folders, so let's visualize the distribution of audio samples per label.

In [6]:
# Plot the number of recordings per label as a bar chart
trace = go.Bar(
    x=labels,
    y=recordings,
    marker=dict(color=recordings),
    text=recordings,
    textposition='outside'
)
layout = go.Layout(
    title='Number of recordings per label',
    xaxis=dict(title='Words'),
    yaxis=dict(title='Number of recordings')
)
py.iplot(go.Figure(data=[trace], layout=layout))
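
If you prefer a numeric summary to the chart, the optional cell below reports the total number of clips and the smallest and largest classes. Note that the _background_noise_ folder, which contains only a handful of long recordings, shows up as the smallest class.

In [ ]:
# Optional numeric summary to complement the chart above.
total = sum(recordings)
smallest = labels[recordings.index(min(recordings))]
largest = labels[recordings.index(max(recordings))]
print('Total number of clips: {}'.format(total))
print('Smallest class: {} ({} clips)'.format(smallest, min(recordings)))
print('Largest class:  {} ({} clips)'.format(largest, max(recordings)))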
In [7]:
# Play audio - sample 1
printmd('**Core word** - ' + filenames[0][0:2] )
printmd('**Speaker** - ' + filenames[0][3:])
ipd.Audio(os.path.join(data_path, filenames[0]))

Core word - on

Speaker - 0a7c2a8d_nohash_0.wav

Out[7]:
In [8]:
# Play audio - sample 2
printmd('**Core word** - ' + filenames[1][0:3] )
printmd('**Speaker** - ' + filenames[1][4:])
ipd.Audio(os.path.join(data_path, filenames[1]))

Core word - off

Speaker - 0ab3b47d_nohash_0.wav

Out[8]:
In [9]:
# Play audio - sample 3
printmd('**Core word** - ' + filenames[2][0:2] )
printmd('**Speaker** - ' + filenames[2][3:])
ipd.Audio(os.path.join(data_path, filenames[2]))

Core word - up

Speaker - 0a7c2a8d_nohash_0.wav

Out[9]:
In [10]:
# Play audio - sample 4
printmd('**Auxiliary word** - ' + filenames[3][0:4])
printmd('**Speaker** - ' + filenames[3][5:])
ipd.Audio(os.path.join(data_path, filenames[3]))

Auxiliary word - bird

Speaker - 0a7c2a8d_nohash_0.wav

Out[10]:
In [11]:
# Play audio - sample 5, another bird sound file
printmd('**Auxiliary word** - ' + filenames[4][0:4])
printmd('**Speaker** - ' + filenames[4][5:])
ipd.Audio(os.path.join(data_path, filenames[4]))

Auxiliary word - bird

Speaker - 0c2ca723_nohash_1.wav

Out[11]:
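
Beyond listening to the clips, it can be useful to check their basic signal properties. The optional cell below reads one sample with scipy.io.wavfile and reports its sample rate and duration; it assumes SciPy is available in the runtime, which this notebook does not otherwise require.

In [ ]:
# Optional: inspect the raw waveform of one sample (assumes SciPy is installed).
from scipy.io import wavfile

sample_rate, samples = wavfile.read(os.path.join(data_path, filenames[0]))
print('Sample rate: {} Hz'.format(sample_rate))
print('Duration:    {:.2f} s'.format(len(samples) / sample_rate))
print('Data type:   {}'.format(samples.dtype))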

3. Add dataset files to the Watson Studio project

Next, we add the extracted data files to the Watson Studio project to make them available to the other notebooks.

In [12]:
# Verify that the extracted artifacts are located in the expected location
if not Path(data_path).exists():
    print('Error. The extracted data files are not located in the {} directory.'.format(data_path))
else:
    # Save extracted data file(s) as project assets
    data_asset_count = 0
    for file in filenames:
        # Save the audio file as a data asset in the project,
        # flattening 'word/speaker-file.wav' into 'word_speaker-file.wav'
        with open(data_path + '/' + file, 'rb') as f:
            asset_name = file.replace('/', '_')
            print(asset_name)
            print('Saving as {} to project data asset ...'.format(asset_name))
            project.save_data(asset_name, f.read(), set_project_asset=True, overwrite=True)
        data_asset_count += 1
    print('Number of added data assets: {} '.format(data_asset_count))
    print('You are ready to run the other notebooks.')
on_0a7c2a8d_nohash_0.wav
Saving as on_0a7c2a8d_nohash_0.wav to project data asset ...
off_0ab3b47d_nohash_0.wav
Saving as off_0ab3b47d_nohash_0.wav to project data asset ...
up_0a7c2a8d_nohash_0.wav
Saving as up_0a7c2a8d_nohash_0.wav to project data asset ...
bird_0a7c2a8d_nohash_0.wav
Saving as bird_0a7c2a8d_nohash_0.wav to project data asset ...
bird_0c2ca723_nohash_1.wav
Saving as bird_0c2ca723_nohash_1.wav to project data asset ...
sheila_00f0204f_nohash_1.wav
Saving as sheila_00f0204f_nohash_1.wav to project data asset ...
cat_0ab3b47d_nohash_0.wav
Saving as cat_0ab3b47d_nohash_0.wav to project data asset ...
dog_0b09edd3_nohash_1.wav
Saving as dog_0b09edd3_nohash_1.wav to project data asset ...
right_0a7c2a8d_nohash_0.wav
Saving as right_0a7c2a8d_nohash_0.wav to project data asset ...
bird_0b77ee66_nohash_0.wav
Saving as bird_0b77ee66_nohash_0.wav to project data asset ...
bird_0eb48e10_nohash_1.wav
Saving as bird_0eb48e10_nohash_1.wav to project data asset ...
bird_0fa1e7a9_nohash_0.wav
Saving as bird_0fa1e7a9_nohash_0.wav to project data asset ...
bird_1d919a90_nohash_2.wav
Saving as bird_1d919a90_nohash_2.wav to project data asset ...
zero_0c40e715_nohash_0.wav
Saving as zero_0c40e715_nohash_0.wav to project data asset ...
Number of added data assets: 14 
You are ready to run the other notebooks.
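
As an optional check, a saved asset can be read back through the project API. The sketch below assumes project_lib's get_file() method, which returns an in-memory, file-like object.

In [ ]:
# Optional check: read one of the saved assets back from the project.
# Assumes project_lib's get_file() returns a file-like object (an assumption about the API).
asset_name = filenames[0].replace('/', '_')   # 'on_0a7c2a8d_nohash_0.wav'
content = project.get_file(asset_name).read()
print('Read back {} ({} bytes)'.format(asset_name, len(content)))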

Next steps

  • Close this notebook.
  • Open the Part 2 - Dataset Visualization notebook to learn more about the data.

Authors

This notebook was created by the Center for Open-Source Data & AI Technologies.

Copyright © 2020 IBM. This notebook and its source code are released under the terms of the MIT License.
